Molecular evolution and phylogenomic analysis of complete chloroplast genomes of Cotinus (Anacardiaceae)

Abstract Cotinus is an oligo‐specific ornamentally valuable genus with a disjunct distribution in the Northern Hemisphere. Traditionally, the taxonomy of Cotinus was mainly based on leaf morphological characteristics. However, the limited availability of genomic information greatly hindered the study of molecular evolution and phylogeny of this genus. This study sequenced the chloroplast (cp) genomes of all currently recognized taxa of Cotinus, including three species and four varieties. A comparative analysis was performed to investigate their cp genome characteristics and evolution. Furthermore, we inferred the phylogenetic relationships of Cotinus based on whole cp genomes, protein‐coding genes, and nuclear ITS data. All cp genomes exhibited a typical quadripartite structure with genome sizes ranging from 158,865 to 160,155 bp. A total of 113–114 genes were identified in the genomes. Seven non‐coding and four coding regions were identified as the most divergent hotspots for potential molecular barcodes and phylogenetic markers. Selection pressure analysis showed that there had been positive selection on genes matK and rps8 in the Cotinus cp genomes. Phylogenetic results confirmed that Cotinus is a monophyletic group but the widely distributed species Cotinus coggygria is not monophyletic. The divergence‐time analysis suggested that Cotinus underwent an evolutionary divergence from the middle Eocene and rapid adaptive radiation from the middle Miocene. This study revealed new insights into the cp genome evolution and phylogeny of Cotinus and related taxa.

tracts with medicinal properties (Matić et al., 2016). It is distributed from southern Europe into the Mediterranean region and, with discontinuities, across the continent through the Himalayas and into China (Gavinet et al., 2019;Miao et al., 2017). Cotinus coggygria is a large shrub or small tree with a variable habitat and diverse leaf morphology. The leaves are 3-8 × 3-6 cm in size, broadly elliptic, suborbicular, obovate, or ovate in outline, glabrous or pubescent.
Due to the great variability of leaf morphological characteristics, C. coggygria was generally further divided into four varieties: C. coggygria var. coggygria, C. coggygria var. cinerea Engl., C. coggygria var.
pubescens Engl., and C. coggygria var. glaucophylla C. Y. Wu (Cheng & Min, 1980;Min & Barfod, 2008). The four varieties are more or less different in leaf shape and pubescence (Min & Barfod, 2008; also see Figure 1). However, whether these varieties are established or monophyletic needs further study. Cotinus obovatus is restricted to the southeastern United States and naturally grows on rocky, calcareous soils in a few hilly and mountainous areas. Its leaves are obovate in outline ( Figure 1a) and can be 13 cm in length, generally larger and coarser than those of C. coggygria (Hoagland, 2004). The other two species, C. szechuanensis and C. nana, are both restricted to the narrow areas of Southwest China. Cotinus szechuanensis is characterized by its relatively small leaves, 2-6 × 2-5 cm in size, almost round in outline with a wavy margin, glabrous adaxially but abaxially with conspicuous tufts of hair in vein axils ( Figure 1b). Its native habitat in the northwestern Sichuan province is usually a small shrub in dry, high-elevation mountainous regions (Li et al., 2007). Cotinus nana is found only in the dry and open areas of hill thickets and grasslands in the southwestern Yunnan province (Min, 1979). This species is a low, compact shrub with a height less than 1 m. Its leaves are leathery and small (0.5-2 cm in diameter), almost round in outline, glabrous both adaxially and abaxially (Figure 1c). The delimitation of species by leaf morphology has been controversial because these traits are easily affected by the environment, and convergence and parallel evolution often occur (Singh, 2010). Therefore, using phenotypic traits alone to infer phylogenetic relationships between taxa with different genotypes is problematic, especially within such an inherently variable genus. Although molecular sequence data have been widely used as phylogenetic tools for decades, there are few molecular studies devoted to establishing relationships in Cotinus.
Using more molecular data is essential to infer the phylogenetic relationships of Cotinus, especially from a genomic perspective.
Chloroplast (cp) genomes are circular DNA molecules composed of a large single copy (LSC) region and a small single copy (SSC) region separated by a pair of inverted repeats (IR) regions (Sugiura, 1992).
Cp genome is independent of the nuclear genome and has highly conserved gene content, number, and structure (Daniell et al., 2016).
Over the past years, it has been used to elucidate plant molecular evolution, enhancing our understanding of chloroplast biology, conservation, and diversity (Daniell et al., 2016). Moreover, cp genomes represent rich information sources for phylogenomics, from higherlevel to below the species level (Barrett, 2020;Dong et al., 2021Dong et al., , 2022Ruhfel et al., 2014). To date, the cp genome is available for only one representative Cotinus species. Xu et al. (2021) reported the complete cp genome of C. coggygria and conducted a comparative analysis of some other genera in Anacardiaceae.
Here, we sequenced and assembled the cp genomes of all currently recognized species and varieties in Cotinus. A comparative analysis of the complete cp genomes was conducted to quantify the cp genomic variation among the species of Cotinus. Furthermore, a phylogenomic analysis was performed based on whole cp genomes to infer the phylogenetic position of Cotinus in Anacardiaceae and the phylogeny among the species. In addition, the nuclear ITS sequences were used to construct the phylogeny to observe the phylogenetic congruence or incongruence between nuclear and cp genome data. Specifically, we attempted to (1) reveal the cp genome characteristics, and divergence patterns of Cotinus; (2) provide useful cp genomic resources for molecular barcodes and phylogenetically informative markers; and (3) improve our understanding of the evolutionary history and infrageneric relationships of Cotinus.

| Plant material, DNA extraction and genome assembly
Ten accessions representing three species and four varieties of Cotinus were sampled and sequenced. The fresh leaves of C. nana, C. szechuanensis, C. coggygria var. cinerea, C. coggygria var. glaucophylla and C. coggygria var. pubescens were collected from the field and dried with silica gel. Voucher specimens were deposited at the Beijing Forestry University Herbarium (BJFC), Beijing. The dry leaves of C. coggygria var. coggygria were sampled from the specimen in the Beijing Forestry University Herbarium. The dry leaves of C. obovatus was provided by Prof. Libing Zhang from Missouri Botanical Garden, USA. Details of the sampling information were shown in Table A1.
Total genomic DNA was extracted from silica-dried leaves using a modified CTAB protocol (Doyle & Doyle, 1987) and purified employing the Wizard DNA Clean-Up System. DNA samples were fragmented randomly and then were sheared into 350 bp fragments through agarose gel electrophoresis. The bp insert size paired-end libraries with 350 were built using the Illumina PE DNA library kit, and then paired reads were sequenced with an Illumina HiSeq 4000.
The cp genome assembly were performed using GetOrganelle v1.6.2d (Jin et al., 2020). The GetOrganelle first filtered plastid-like reads, conducted the de novo assembly, purified the assembly, and finally generated the complete cp genomes (Bankevich et al., 2012;Camacho et al., 2009;Langmead & Salzberg, 2012). K-mer gradients for a mean and maximum 150 bp reads were set to "-k 21, 45, 65, 85,105" for all taxa. Bandage (Wick et al., 2015) was used to visualize the final assembly graphs to authenticate the automatically generated cp genome. Gene annotation was conducted using Plann (Huang & Cronk, 2015) with the annotated cp genome of Pistacia chinensis (NC_046786.1) as the reference genome. These cp genome sequences were deposited in GenBank (ON167899-ON167909; Table A1). The nuclear ITS sequences were assembled from the clean data of same samples using GetOrganelle as the cp genome assembly.
Drawing the IR/SC boundaries regions map display the IR expansion and contraction among Cotinus cp genomes. The mVISTA program (http://genome.lbl.gov/vista/ mvist a/about.shtml ;Frazer et al., 2004) was performed using the annotation of C. coggygria var. cinerea as the reference to analyze genomic structural variation. Visual results from mVISTA were further analyzed using the alignment model LAGAN, which producing true multiple alignments regardless of whether they contain inversions (Brudno et al., 2003).

| Nucleotide divergence and selection pressure estimation
To screen polymorphic hotspots that can be used as molecular markers for species identification of Cotinus, 78 shared coding regions (CDS) and 103 intergenic spacer regions (IGS) of the seven cp genomes were separately extracted, aligned using MAFFT (Katoh et al., 2019), and manually adjusted using MEGA 7 (Kumar et al., 2016). Subsequently, the aligned sequence length, number of variable sites, and parsimony-informative characters (PICs) for each region were calculated using DnaSP 5.0 (Librado & Rozas, 2009). We used two methods to identify mutation hotspots. The first was based on the percentage of variable sites, which calculated all the single-nucleotide substitutions and included the most divergence markers. The second was based on the percentage of PICs, which included the most informative markers. To estimate selection pressures, non-synonymous (Ka) and synonymous (Ks) substitution rate ratios (ω = Ka/Ks) of the 78 protein-coding genes were calculated using the CodeMl program in the PAML v4.9 Xu & Yang, 2013). We applied three site-evolution models to test whether some sites (codons) are under positive selection: the M0 (one ratio), M7 (β) and M8 (β and ω). Likelihood ratio tests (LRTs) with p-value < .05 were performed to test the statistical significance.

| Phylogenetic analyses
To assess the phylogenetic relationships of Cotinus, we established three datasets and used Maximum likelihood (ML) and Bayesian Inference (BI) methods for phylogenetic analysis. The first dataset WCG included 25 whole cp genomes from Anacardiaceae and two outgroup genomes from Burseraceae (10 newly generated and 17 obtained from GenBank; Table A2). The second dataset PCG comprised 76 protein-coding genes (excluding two genes under positive selection) in the cp genomes. The third ITS dataset included 24 nuclear ITS sequences (10 newly created with the cp genome and 14 obtained from GenBank; Table A2). All the sequences were aligned with MAFFT (Katoh et al., 2019), and ambiguous alignment regions were trimmed by Geneious 2021 with default parameters (Sites containing: Gaps = 20%; Ripma et al., 2014).
The best-fitting nucleotide substitution model was determined using the ModelFinder (Kalyaanamoorthy et al., 2017). The ML trees were used RAxML v.8.2.12 (Stamatakis, 2014) software, and the reliability of the phylogenetic tree topology adopts 1000 self-expanding times analysis and inspection (bootstrap support values, BS). The BI tree was generated using the MrBayes 3.2.6 software (Ronquist et al., 2012), based on Markov Chain Monte Carlo (MCMC) process. Four chains were run in parallel for 1,000,000 generations, sampling once every 1,000 generations, discarding the first 25% of the trees, and verifying them by Bayesian posterior probabilities (PP).

| Dating divergence times
We estimated the divergence times using relaxed-clock method implemented in the BEAST version 1.8.2 (Drummond et al., 2012) and the whole cp genomes dataset. Three fossils were used as a priori to calibrate the divergence time. Wood fossils related to Anacardiaceae and Burseraceae were reported from the upper Cretaceous of Mexico (Estrada-Ruiz et al., 2010). We set the root age (the divergence time between Anacardiaceae and Burseraceae) to be 70 million years ago (Ma). This age was consistent with the hypothetical diversification time of Burseraceae (De-Nova et al., 2012). The reliable fruit fossils of Rhus were found in the middle Eocene of western North America (Manchester, 1994).
The minimum stem age of Rhus was calibrated to be 44 Ma. The oldest fossil records of Pistacia were found in the Oligocene of Europe and North America (Ramirez & Cevallos-Ferriz, 2002), and this fossil was used to calibrate the minimum stem age of Pistacia to be 33 Ma. These three fossil priors were given lognormal prior distributions with an offset of the fossil ages, a mean of 1.5 and a standard deviation of 1, allowing for the possibility that these nodes are considerably older than the fossils themselves.
BEAUti v2.4.7 was used to create the input file for Beast. The GTR + G alternative model and the relaxed log-normal clock model were set to run for a total of 40 million generations, sampling every 1,000 generations, ensuring that the parameters converged in Tracer v.1.7 (Rambaut et al., 2018) with valid sample size values >200. The first 20% of the resulting trees set were rounded off and the remainder were used to construct the Maximum Clade Credibility (MCC) tree in TreeAnnotator v.1.10.4.

| Chloroplast genome characteristics of Cotinus
We assembled and compared 10 whole cp genomes from three species and four varieties of Cotinus. All cp genomes were typical covalently closed, double-stranded circular molecules that contained an LSC region of 87,325-87,605 bp, an SSC region of 18,135-19,445 bp, and a pair of IR regions of 26,306-26,832 bp ( Figure 2 and Table 1).
The cp genome sizes ranged from 158,865 bp in C. coggygria var.

| IR contraction and expansion
To elucidate the putative contraction and expansion of the IR regions in Cotinus cp genomes, we investigated the gene variation at the IR/SC boundary regions (Figure 3). In the LSC/IRb border, the rpl2 gene in the LSC region was all expanded completely to the IRb region. In the IRb/SSC border, the ndhF gene all crossed the junctions and extended into the IRb region, with a length from 33 to 34 bp. In the SSC/IRa border, the junctions were all crossed by the The LSC/IRb junctions were located between rpl22 and trnH in C. coggygria var. pubescens, and between rps22 and rpl2 in C. obovatus, because the rps19 gene was lost in these two cp genomes.

| Sequence divergence and mutation hotspot regions
To quantify sequence divergence, the whole cp genomes from the seven taxa of Cotinus were compared with mVISTA. The alignments revealed high sequence similarity across these cp genomes ( Figure 4). The loss of rpl32 gene led to the gaps in the alignments F I G U R E 2 Gene map of Cotinus cp genomes. Genes shown inside the inner circle are transcribed counterclockwise and those outside are transcribed clockwise. Genes belonging to different functional groups are colour-coded. Darker grey in the inner circle corresponds to the GC content of the cp genome. for C. szechuanensis and C. coggygria var. glaucophylla, and the loss of rps19 gene led to the gaps for C. obovatus and C. coggygria var.
pubescens. The average variable percentages of the whole genome, coding and non-coding regions were 0.88%, 0.59%, and 1.14%, respectively, with the highest percentage in the SSC region (1.40%), followed by the LSC region (1.27%), and then the IR region (only 0.10%; Table 3). The PICs percentages of the whole genome, coding and non-coding regions were 0.11%, 0.04%, and 0.21%, respectively. The SSC region had the highest PICs percentage (0.16%), followed by the LSC region (0.14%), and the IR region (only 0.01%; Table 3).
To identify the mutation hotspot regions, the percentages of variable sites and PICs were calculated in each coding and noncoding region ( Figure 5 and Table 3). We defined a mutation hotspot region as having a percentage of variable sites >5% for non-coding regions and >3% for coding regions. Seven non-coding genes (atpA-atpF, rps16-trnQ UUG , and trnK UUU -rps16) and four coding genes (rps18, rpl22, rpl32 and rpl36) were identified as the most divergent hotspot regions. These hotspot regions were all located in the LSC region except for rpl32 located in the SSC region. Concerning PICs, gene regions atpA-atpF (7.02%) and atpF-atpH (4.15%) had higher PICs percentages in noncoding regions, while gene rpl22 had a higher percentage (1.54%) in coding regions.

| Selection pressure analysis
Using the site-evolution models of CodeMl, non-synonymous to synonymous nucleotide substitution rate ratios (Ka/Ks) for 78 shared protein-coding genes were calculated ( Figure 6). Among these genes, the Ka/Ks ratios had a range from 0 to 1.5070 with an average value of 0.2471. The Ka/Ks ratios showed eight genes (atpE, ndhG, petG, psaI, psbH, psbK, ycf3, and ycf15)

| Phylogenetic analysis
The phylogenomic analyses were conducted based on WCG and PCG datasets (Figure 7) cinerea. There were high support values for the second clade in both WCG and PCG trees (PP/BS: 1/100 and PP/BS: 1/98, respectively).

F I G U R E 3
Comparison of the borders of large single copy (LSC), small single copy (SSC), and inverted repeats (IR) regions in cp genomes of seven Cotinus taxa. Different colour boxes indicate specific genes.
The ITS dataset also supported that all the taxa of Cotinus formed a monophyletic group but most nodes were lower supported within Cotinus (Figure 8). In addition, extensive incongruences between cp genome and nuclear data occurred in both species and varieties levels. For example, ITS data showed that C. obovatus was the firstly diverged group in Cotinus.

| Divergence time dating
Based on whole cp genomes data, the mean estimated ages and 95% HPD (highest posterior density) intervals of nodes were mapped onto the dating chronogram ( Figure 9) Barkley (37.96%;Wang et al., 2020;Zheng et al., 2017). The average GC contents of the IR regions were higher than in the LSC and SSC regions, indicating that rRNA and tRNA, found mainly in IR regions, prefer to use G and C bases (Li et al., 2016).
The contraction and expansion of IR generally lead to gene order change in the cp genomes of angiosperms and play important roles in genome evolution (Wicke et al., 2011). The inclusion of three species and four varieties in Cotinus showed that their IR boundaries were variable at and below the species levels. Large IR expansions in these cp genomes caused the rpl2 gene to transform to the IRb region completely, and the trnH gene transformed to the IRa region to various degrees. IR expansion also led to the partial duplication of ycf1 in the IRb region, thereby generating a ycf1 pseudogene located in the IRb/SSC border.
Gene loss often occurs during the evolution of angiosperms cp genomes (Daniell et al., 2016;Jansen et al., 2008). It has been proposed that contraction/expansion of the IR regions could lead to gene loss/addition (Wicke et al., 2011). Among the examined cp genomes, the rpl32 gene was lost in C. szechuanensis and C. coggygria var. glaucophylla, and the rps19 gene was missing in C. obovatus and C. coggygria var. pubescens, indicating that gene deletion occurred during cp evolution. The missing rps19 gene from the cp genomes was also reported in some species in Rhus (Barrett, 2020). Some lost genes were proved to be transferred to the nuclear or mitochondrial genomes, where they may still be expressed (Daniell et al., 2016).
For example, the rpl32 gene transfers to the nucleus in Populus and Thalictrum (Park et al., 2015;Ueda et al., 2007). It is unknown whether the loss of rps19 and rpl32 from the cp genomes of Cotinus species is due to gene transfer to the nucleus/mitochondrion or loss from the cell entirely.
Mutation hotspots can provide valuable variable sequence information for species identification and phylogeny research (Dong et al., 2021(Dong et al., , 2022. Most sequence mutation hotspots occurred in the non-coding regions across the cp genomes of the seven Cotinus taxa, indicating the coding regions were more conserved than the non-coding ones, consistent with most angiosperm cp genomes (Daniell et al., 2016). We identified seven non-coding and four coding regions as the most divergent hotspots in Cotinus. All hotspots are located in the LSC and SSC regions and the IR regions showed higher conservation than the LSC and SSC regions. These divergent regions may be undergoing rapid nucleotide substitution, indicating great potential molecular markers for phylogeny and evolutionary research in Cotinus.
Mutations accompany species adaptations to diverse environments under positive selection (Dong et al., 2018;Yin et al., 2018). suggesting that the two genes may have undergone positive selective pressures. The matK gene is located within the trnK intron, and functionally may be involved in splicing group II introns coding for tRNALys (Crawley & Hilu, 2012). Due to the unknown function, rps8 is a valuable resource for future research of the adaptive evolution of Cotinus.
In our analyses, two individuals of C. nana, C. szechuanensis and C. coggygria var. cinerea from different localities clustered with extremely short terminal branches, indicating that they were highly homologous respectively. On the other hand, the phylogenetic trees had very short internode connected by long branches at the interspecific nodes, suggesting that Cotinus may have undergone rapid speciation and adaptive radiation during the evolution process.
Phylogenetic reconstruction may be difficult due to hybridization/ introgression, incomplete lineage sorting (ILS), and reticulation evolution. This is especially true for the groups in which speciation was accompanied by rapid adaptive radiation (Azarin et al., 2021;Meng et al., 2021;Rosenberg & Nordborg, 2002). In our WCG tree, the phylogenetic relationships among C. coggygria var. coggygria, C.
obovatus, C. coggygria var. pubescens and C. nana were still not fully resolved due to moderate support values. Moreover, there was incongruence concerning the relationships of these four taxa between WCG, PCG and ITS data. This is probably due to their rapid radiation or reticulate evolution during the period of the late Miocene.

| CON CLUS IONS
The present study investigated the characteristics and evolution of the cp genomes from three species and four varieties in Cotinus, representing all currently recognized taxa of this genus. Comparing these genomes revealed that the genome structure and gene content were fairly conserved among these taxa. However, the IR boundaries were variable at both species and varieties levels, which was related to IR expansion and gene loss. The sequence divergence and mutation hotspots of these cp genomes were analyzed. Seven noncoding and four coding regions had the highest divergence and may be considered useful molecular barcodes and phylogenetic markers. Selection pressure analysis showed that there had been significant positive selection on matK and rps8 genes in the cp genomes of Cotinus. The phylogenetic analysis based on whole cp genomes, protein-coding genes and nuclear ITS data confirmed that the genus Cotinus is a monophyletic group but the widely distributed species C. coggygria is not monophyletic. In addition, the divergence-time analysis revealed evolutionary radiation of Cotinus from the middle Miocene. This study revealed new insights into the cp genome evolution and phylogeny of Cotinus and related taxa.

ACK N OWLED G M ENTS
We thank Prof. Libing Zhang (Missouri Botanical Garden) for providing leaf material of Cotinus obovatus. We also would like to thank Dr. Ce Sang and Dr. Yachao Wang for their kind help in samples collection.

This research was supported by Science and Technology Basic
Resources Investigation Program of China "Survey and Germplasm

Conservation of Plant Species with Extremely Small Populations in
South-west China" (Grant No. 2017FY100100).

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare that there is no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The 10 newly sequenced chloroplast genomes were submitted to the NCBI database (https://www.ncbi.nlm.nih.gov/) with GenBank accession numbers ON167899-ON167909.